Optimizing Kernel Block Memory Operations
Authors
Abstract
This paper investigates the performance of block memory operations in the operating system, including memory copies, page zeroing, interprocess communication, and networking. The performance of these common operating system operations is highly dependent on the cache state and future use pattern of the data, and no single routine maximizes performance in all situations. Current systems use a statically selected algorithm to perform block memory operations. This paper introduces a method to dynamically predict the optimal algorithm for each block memory operation: predicting both the current state of the cache and whether the target data will be reused before it is evicted. By using these predictions to select the optimal software algorithm for each operation, the performance of kernel copy operations can be improved.
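As a rough illustration of the selection step described above (a minimal sketch, not the paper's actual implementation; all names here, such as `select_copy_strategy` and the `dst_likely_reused` flag, are assumptions standing in for the paper's cache-state and reuse predictors):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only. The kernel chooses a copy routine per
 * operation from two predictions: the current cache state and whether
 * the destination will be reused before eviction. */

enum copy_strategy {
    COPY_TEMPORAL,     /* ordinary cache-allocating copy */
    COPY_NON_TEMPORAL  /* streaming stores that bypass the cache */
};

/* If the destination is predicted to be reused before it is evicted,
 * keeping it in cache wins; otherwise a non-temporal copy avoids
 * displacing useful cache lines. */
static enum copy_strategy
select_copy_strategy(int dst_likely_reused, int src_in_cache)
{
    (void)src_in_cache; /* a fuller model would also weigh source state */
    return dst_likely_reused ? COPY_TEMPORAL : COPY_NON_TEMPORAL;
}

/* Dispatch on the prediction. Both paths fall back to memcpy in this
 * sketch; a kernel would substitute, e.g., a non-temporal-store loop
 * for the COPY_NON_TEMPORAL case. */
static void *
dynamic_copy(void *dst, const void *src, size_t n, int dst_likely_reused)
{
    switch (select_copy_strategy(dst_likely_reused, /*src_in_cache=*/1)) {
    case COPY_NON_TEMPORAL:
        return memcpy(dst, src, n); /* stand-in for a streaming copy */
    case COPY_TEMPORAL:
    default:
        return memcpy(dst, src, n); /* stand-in for a cached copy */
    }
}
```

The point of the sketch is only the dispatch structure: the hard part in practice, which the paper addresses, is making the two predictions cheaply and accurately at run time.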
Similar resources
Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY
Sparse matrix-vector multiplication is an important computational kernel that tends to perform poorly on modern processors, largely because of its high ratio of memory operations to arithmetic operations. Optimizing this algorithm is difficult, both because of the complexity of memory systems and because the performance is highly dependent on the nonzero structure of the matrix. The Sparsity sy...
Characterization of Block Memory Operations
Block memory operations are frequently performed by the operating system and consume an increasing fraction of kernel execution time. These operations include memory copies, page zeroing, interprocess communication, and networking. This thesis demonstrates that performance of these common OS operations is highly dependent on the cache state and future use pattern of the data. This thesis argues...
Optimizing Sparse Matrix-vector Multiplication Based on GPU
In recent years, Graphics Processing Units (GPUs) have attracted the attention of many application developers as powerful massively parallel systems. The Compute Unified Device Architecture (CUDA), a general-purpose parallel computing architecture, makes GPUs an appealing choice for solving many complex computational problems more efficiently. The Sparse Matrix-vector Multiplication (SpMV) algorithm...
A Parallel Computational Kernel for Sparse Nonsymmetric Eigenvalue Problems on Multicomputers
The aim of this paper is to present a reorganization of the nonsymmetric block Lanczos algorithm that is efficient, portable, and scalable on multiple instruction multiple data (MIMD) distributed memory message passing architectures. The basic operations implemented here are matrix-matrix multiplications, possibly with a transposed and a sparse factor, LU factorisation, and triangular systems sol...
Optimizing Performance on Modern HPC Systems: Learning From Simple Kernel Benchmarks
We discuss basic optimization and parallelization strategies for current cache-based microprocessors (Intel Itanium2, Intel Netburst, and AMD64 variants) in single-CPU and shared memory environments. Using selected kernel benchmarks that represent data-intensive applications, we focus on the attainable effective bandwidths, which are still suboptimal with current compilers. We stress the need for a...
Journal:
Volume Issue
Pages -
Publication year 2006